<html>
<head><meta charset="utf-8"><title>DWARF .debug_line column and utf-8 · t-compiler/wg-llvm · Zulip Chat Archive</title></head>
<h2>Stream: <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/index.html">t-compiler/wg-llvm</a></h2>
<h3>Topic: <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html">DWARF .debug_line column and utf-8</a></h3>

<hr>

<base href="https://rust-lang.zulipchat.com">

<head><link href="https://rust-lang.github.io/zulip_archive/style.css" rel="stylesheet"></head>

<a name="183586678"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183586678" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> bjorn3 <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183586678">(Dec 16 2019 at 20:23)</a>:</h4>
<p>While trying to optimize the .debug_line generation of cg_clif, I noticed that cg_llvm uses the <code>CharPos</code> as column number in .debug_line column field. I tried to find if this is the correct value, or that a <code>BytePos</code> should be used instead, but the DWARF specification doesn't seem to specify which one is the correct one.</p>



<a name="183586691"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183586691" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> bjorn3 <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183586691">(Dec 16 2019 at 20:23)</a>:</h4>
<p>If it is the <code>BytePos</code>, that could speed up .debug_line generation in cg_clif and maybe cg_llvm by a few percent, as there is no need to conversion anymore.</p>



<a name="183588611"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183588611" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> cuviper <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183588611">(Dec 16 2019 at 20:44)</a>:</h4>
<p>good question!</p>



<a name="183588692"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183588692" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> cuviper <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183588692">(Dec 16 2019 at 20:45)</a>:</h4>
<p>in the absence of an answer in the standard, which I don't see either, I'd look at what other producers like clang do, or how consumers like gdb apply it</p>



<a name="183590002"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183590002" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> bjorn3 <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183590002">(Dec 16 2019 at 21:00)</a>:</h4>
<h1>Clang</h1>
<p><a href="https://github.com/llvm/llvm-project/blob/ff07fc66d9eef577f3b44716f72e581a18cd9ac9/clang/lib/CodeGen/CGDebugInfo.cpp#L487" target="_blank" title="https://github.com/llvm/llvm-project/blob/ff07fc66d9eef577f3b44716f72e581a18cd9ac9/clang/lib/CodeGen/CGDebugInfo.cpp#L487"><code>CGDebugInfo::getColumnNumber</code></a> calls <a href="https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/clang/lib/Basic/SourceManager.cpp#L1413" target="_blank" title="https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/clang/lib/Basic/SourceManager.cpp#L1413"><code>SourceManager::getPresumedLoc</code></a> to get the <code>PresumedLoc</code> which contains the column number. <code>SourceManager::getPresumedLoc</code> calls <a href="https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/clang/lib/Basic/SourceManager.cpp#L1117" target="_blank" title="https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/clang/lib/Basic/SourceManager.cpp#L1117"><code>SourceManager::getColumnNumber</code></a> which seems to use a byte position and not a character position.</p>



<a name="183592457"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183592457" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> bjorn3 <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183592457">(Dec 16 2019 at 21:25)</a>:</h4>
<h1>GCC</h1>
<p>Has a 21725 lines long parser file for C.</p>
<p><a href="https://github.com/gcc-mirror/gcc/blob/1d858c08136273ba1237770760330994a044ebd3/gcc/c/c-parser.c#L1754" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/1d858c08136273ba1237770760330994a044ebd3/gcc/c/c-parser.c#L1754"><code>add_debug_begin_stmt</code></a> is often called with a <code>location_t</code> from <a href="https://github.com/gcc-mirror/gcc/blob/1d858c08136273ba1237770760330994a044ebd3/gcc/c/c-parser.c#L470" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/1d858c08136273ba1237770760330994a044ebd3/gcc/c/c-parser.c#L470"><code>c_parser_peek_token (parser)-&gt;location</code></a> which takes it from <a href="https://github.com/gcc-mirror/gcc/blob/1d858c08136273ba1237770760330994a044ebd3/gcc/c/c-parser.c#L264" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/1d858c08136273ba1237770760330994a044ebd3/gcc/c/c-parser.c#L264"><code>c_lex_one_token</code></a> which in turn takes it from <a href="https://github.com/gcc-mirror/gcc/blob/03410c5ec328943fb2aeefd8963177f5bddd821d/gcc/c-family/c-lex.c#L390" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/03410c5ec328943fb2aeefd8963177f5bddd821d/gcc/c-family/c-lex.c#L390"><code>c_lex_with_flags</code></a> which I think takes it from <a href="https://github.com/gcc-mirror/gcc/blob/33bb12b1ed4b88023f8076822807d648cab9c899/libcpp/macro.c#L2883" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/33bb12b1ed4b88023f8076822807d648cab9c899/libcpp/macro.c#L2883"><code>cpp_get_token_with_location</code></a> which tail calls <a href="https://github.com/gcc-mirror/gcc/blob/33bb12b1ed4b88023f8076822807d648cab9c899/libcpp/macro.c#L2680" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/33bb12b1ed4b88023f8076822807d648cab9c899/libcpp/macro.c#L2680"><code>cpp_token_get_1</code></a> which calls <a href="https://github.com/gcc-mirror/gcc/blob/7496b8810aacd28b1998ffe91e580e450fad2a1a/libcpp/lex.c#L2574" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/7496b8810aacd28b1998ffe91e580e450fad2a1a/libcpp/lex.c#L2574"><code>_cpp_lex_token</code></a> which calls <a href="https://github.com/gcc-mirror/gcc/blob/7496b8810aacd28b1998ffe91e580e450fad2a1a/libcpp/lex.c#L2703" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/7496b8810aacd28b1998ffe91e580e450fad2a1a/libcpp/lex.c#L2703"><code>_cpp_lex_direct</code></a> which finally uses the <a href="https://github.com/gcc-mirror/gcc/blob/6127fbba971b0096d7631ab96f25e07fe02e7662/libcpp/internal.h#L66" target="_blank" title="https://github.com/gcc-mirror/gcc/blob/6127fbba971b0096d7631ab96f25e07fe02e7662/libcpp/internal.h#L66"><code>CPP_BUF_COLUMN</code></a> macro, which uses a simple substraction of pointers, so it uses byte position too.</p>



<a name="183592492"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183592492" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> bjorn3 <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183592492">(Dec 16 2019 at 21:25)</a>:</h4>
<p>I will fill an issue to change cg_llvm to use byte positions too.</p>



<a name="183592795"></a>
<h4><a href="https://rust-lang.zulipchat.com#narrow/stream/187780-t-compiler/wg-llvm/topic/DWARF%20.debug_line%20column%20and%20utf-8/near/183592795" class="zl"><img src="https://rust-lang.github.io/zulip_archive/assets/img/zulip.svg" alt="view this post on Zulip" style="width:20px;height:20px;"></a> bjorn3 <a href="https://rust-lang.github.io/zulip_archive/stream/187780-t-compiler/wg-llvm/topic/DWARF.20.2Edebug_line.20column.20and.20utf-8.html#183592795">(Dec 16 2019 at 21:29)</a>:</h4>
<p><a href="https://github.com/rust-lang/rust/issues/67360" target="_blank" title="https://github.com/rust-lang/rust/issues/67360">https://github.com/rust-lang/rust/issues/67360</a></p>



<hr><p>Last updated: Aug 07 2021 at 22:04 UTC</p>
</html>